Abstract

In highly distributed Internet measurement systems, distributed agents periodically measure
the Internet using a tool called traceroute, which discovers a path in the network graph. Each
agent performs many traceroute measurements to a set of destinations in the network, and thus
reveals a portion of the Internet graph as it is seen from the agent locations. In every period
we need to check whether previously discovered edges still exist in this period, a process termed
validation. To this end we maintain a database of all the different measurements performed by
each agent. Our aim is to be able to validate the existence of all previously discovered edges in
the minimum possible time.

In this work we formulate the validation problem as a generalization of the well-known set
cover problem. We reduce the set cover problem to the validation problem, thus proving that
the validation problem is $\mathcal{NP}$-hard. We present an $O(\log n)$-approximation algorithm for the
validation problem, where $n$ is the number of edges that need to be validated. We also show
that unless $\mathcal{P} = \mathcal{NP}$ the approximation ratio of the validation problem is $\Omega(\log n)$.
1 Introduction

Our problem arises in the context of highly distributed Internet measurement systems [6, 7]. In this
type of system, distributed agents periodically measure the Internet using a tool called traceroute,
which discovers a path in the network graph¹. Each agent performs many traceroute measurements
to a set of destinations in the network, and thus reveals a portion of the Internet graph as it is seen
from the agent locations. While some edges can be seen from many measurement locations, others
can be seen only from a handful of locations [6, 7, 1], which is the major reason for distributing this
process. We create a periodic map by unifying the measurements made by all the agents over this
period.

There are many possible heuristics to direct agents to destinations in order to find as many
graph edges as possible. However, one thing we have to do in every period is to check whether
previously discovered edges still exist in this period, a process termed validation. To this end we
maintain a database of all the different measurements performed by each agent². Our aim is to be
able to validate the existence of all previously discovered edges in the minimum possible time.

¹ The path can be expressed at various levels of abstraction. The most common level in use is the autonomous
system (AS) level, where each node in the graph (and thus in the path) represents an AS (or a network) in the Internet.
² The list is kept at the abstraction level we are interested in, e.g., at the AS level.

A solution to the validation problem is to model each traceroute measurement as a set of edges,
and then look for the smallest group of traceroute measurements (the sets) that covers the known
graph, e.g., using a logarithmic approximation algorithm for set cover [3]. However, this solution may
end up assigning many of the chosen measurements to one agent while leaving other agents with few
or no measurements to perform. Since all agents measure at roughly the same rate, the termination
time of the validation task is determined by the time it takes the agent with the largest number
of measurements to complete its task. Thus, our aim is not to minimize the number of measurements
that cover the graph, but to minimize the number of measurements assigned to the agent with the
most measurements. Therefore reducing the validation problem to the set cover problem will not
necessarily give us the best solution, so we describe the validation problem as a generalization of
the set cover problem.
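To see why minimizing the number of measurements is not the same as minimizing the load of the busiest agent, the following brute-force sketch compares both objectives on a small hypothetical instance (four agents named a, b, c, d and six edges; the data is invented for illustration, not taken from real measurements). The only smallest cover uses two traceroutes of agent a, while spreading the work over agents b, c, and d leaves every agent with a single measurement.

```python
from itertools import combinations

# Hypothetical toy instance: each traceroute is the set of edges it reveals,
# tagged with the (hypothetical) agent that can perform it.
TRACEROUTES = [
    ("a", frozenset({1, 2, 3})),
    ("a", frozenset({4, 5, 6})),
    ("b", frozenset({1, 2})),
    ("c", frozenset({3, 4})),
    ("d", frozenset({5, 6})),
]
UNIVERSE = frozenset(range(1, 7))


def covers(chosen):
    """True if the chosen traceroutes together reveal every edge."""
    seen = set()
    for _, edges in chosen:
        seen |= edges
    return seen == UNIVERSE


def max_load(chosen):
    """Largest number of chosen traceroutes assigned to a single agent."""
    loads = {}
    for agent, _ in chosen:
        loads[agent] = loads.get(agent, 0) + 1
    return max(loads.values())


def all_covers():
    """Enumerate every covering subcollection (exponential: tiny inputs only)."""
    for r in range(1, len(TRACEROUTES) + 1):
        for chosen in combinations(TRACEROUTES, r):
            if covers(chosen):
                yield chosen


# Set cover objective: fewest traceroutes overall.
min_size = min(len(c) for c in all_covers())
# Busiest-agent load of the smallest covers (here there is exactly one).
load_of_smallest = min(max_load(c) for c in all_covers() if len(c) == min_size)
# Validation objective: smallest achievable load of the busiest agent.
best_load = min(max_load(c) for c in all_covers())

print(min_size, load_of_smallest, best_load)  # 2 2 1
```

Exhaustive enumeration only serves to illustrate the two objectives; it is not meant for realistic instances.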

Our Results. We define a new generalization of the set cover problem that is equivalent to
the validation problem, give an $O(\log n)$-approximation algorithm for it, where $n$ is the number of
edges in the validation problem, and show that our approximation ratio is tight, namely that unless
$\mathcal{P} = \mathcal{NP}$ our generalization of set cover cannot be approximated in polynomial time to within a factor of $o(\log n)$.

Organization: In Section 2 we give notation and a formal definition of the problem. In Section 3
we present an $O(\log n)$-approximation algorithm for the generalized set cover problem, and prove
that this ratio cannot be asymptotically improved.

2 Preliminaries 

For an algorithm $A$, denote the objective value of a solution it delivers on an input $I$ by $A(I)$. An
optimal solution is denoted by $\mathrm{OPT}$, and the optimal objective value is denoted by $\mathrm{OPT}$ as well.
The (absolute) approximation ratio of $A$ is defined as the infimum $\rho$ such that for any input $I$,
$A(I) \le \rho \cdot \mathrm{OPT}(I)$.

Given a universe $U = \{u_1, \dots, u_n\}$ and a family of its subsets $\mathcal{S} = \{S_1, \dots, S_k\} \subseteq \mathcal{P}(U)$
with $\bigcup_{S_j \in \mathcal{S}} S_j = U$, set cover is the problem of finding a minimum-size sub-family $\mathcal{S}'$ of $\mathcal{S}$ that covers the
whole universe, $\bigcup_{S_j \in \mathcal{S}'} S_j = U$. Set cover is a classic $\mathcal{NP}$-hard combinatorial optimization problem,
and it is known that it can be approximated to within $\ln n - \ln\ln n + O(1)$ [?, ?, 9]. By [5, 2] it follows
that unless $\mathcal{P} = \mathcal{NP}$, there exists a constant $0 < c < 1$ such that set cover cannot be efficiently
approximated to within any number smaller than $c \log_2 n$.
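For reference, the classic greedy rule for set cover repeatedly takes the set that covers the most still-uncovered elements; it is the standard algorithm behind logarithmic approximation guarantees of the kind quoted above. A minimal Python sketch (the function name and the list-of-sets input format are our own choices, not from the cited works):

```python
def greedy_set_cover(universe, family):
    """Classic greedy set cover: return the indices of the chosen sets.

    universe: a set of elements; family: a list of sets whose union is universe.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set covering the most still-uncovered elements (lowest index on ties).
        best = max(range(len(family)), key=lambda j: len(family[j] & uncovered))
        if not family[best] & uncovered:
            raise ValueError("the family does not cover the universe")
        chosen.append(best)
        uncovered -= family[best]
    return chosen


family = [{1, 2, 3}, {4, 5, 6}, {1, 4}, {2, 5}, {3, 6}]
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, family))  # [0, 1]
```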

We formalize the validation problem discussed in the introduction in the following manner: every
edge in a traceroute is an element in a universe $U$. Each traceroute is modeled as a set of elements
in $U$, namely its edges. Each agent is modeled as a family of sets, indicating the list of traceroutes it can
perform. Moreover, each agent has a weight, indicating the number of traceroutes it can perform
in a time period. Thus we get the following problem:

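Problem 1 (Validation Set Cover, VSC). Given a universe $U$ of $n$ elements, a collection of subsets
of $U$, $\mathcal{S} = \{S_1, \dots, S_k\}$, a partition $\pi = \{A_1, \dots, A_m\}$ of $\mathcal{S}$ where $A_i \subseteq \mathcal{S}$, and a weight function
$\omega : \pi \to \mathbb{N}$, find a subcollection $\mathcal{S}'$ of $\mathcal{S}$ that covers all elements of $U$ such that
$\max_{1 \le i \le m} \left\lceil |\mathcal{S}' \cap A_i| / \omega(A_i) \right\rceil$, the number of time periods needed by the busiest agent, is minimum.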
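Problem 1 can be read directly as code: represent each part $A_i$ as a list of sets and its weight $\omega(A_i)$ as a positive integer, check that a candidate subcollection $\mathcal{S}'$ covers $U$, and compute its objective value. This is a minimal sketch; the function name, the index-based encoding of $\mathcal{S}'$, and the toy instance are hypothetical, and it assumes the rounded-up (number of periods) reading of the objective.

```python
def vsc_objective(universe, parts, weights, chosen):
    """Objective value of a candidate VSC solution S' (hypothetical encoding).

    parts[i]   -- list of the sets in A_i
    weights[i] -- omega(A_i), a positive integer
    chosen[i]  -- indices (into parts[i]) of the sets of S' that lie in A_i
    Returns max_i ceil(|S' ∩ A_i| / omega(A_i)), or None if S' does not cover U.
    """
    covered = set()
    for part, picks in zip(parts, chosen):
        for j in picks:
            covered |= part[j]
    if not set(universe) <= covered:
        return None  # infeasible: some element of U is left uncovered
    # Integer ceiling division: time periods agent i needs for its share of S'.
    return max(-(-len(picks) // w) for picks, w in zip(chosen, weights))


U = {1, 2, 3, 4, 5, 6}
parts = [[{1, 2, 3}, {4, 5, 6}], [{1, 2}, {3, 4}, {5, 6}]]
weights = [1, 3]
print(vsc_objective(U, parts, weights, [[0, 1], []]))     # 2
print(vsc_objective(U, parts, weights, [[], [0, 1, 2]]))  # 1
```

The smaller cover (two sets of $A_1$) scores worse than the larger one, because $A_2$ can perform three measurements per period.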



Note: the Validation Set Cover problem is indeed a generalization of the set cover problem: if
$m = 1$ and $\omega(A_1) = 1$ then the Validation Set Cover problem is exactly the set cover problem. Thus the Validation
Set Cover problem is also $\mathcal{NP}$-hard.
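The reduction in the note is mechanical: a set cover instance becomes a VSC instance with a single part holding every set and weight 1, so the VSC objective of a cover equals its size. A minimal sketch, using a hypothetical dictionary encoding of a VSC instance:

```python
def set_cover_to_vsc(universe, family):
    """Map a set cover instance to an equivalent VSC instance (hypothetical encoding).

    The VSC instance has m = 1 (one part A_1 containing every set) and
    omega(A_1) = 1, so |S' ∩ A_1| / omega(A_1) = |S'| and the objectives coincide.
    """
    return {
        "universe": set(universe),
        "parts": [list(family)],  # a single part: A_1 = S
        "weights": [1],           # omega(A_1) = 1
    }


inst = set_cover_to_vsc({1, 2, 3}, [{1, 2}, {3}])
print(len(inst["parts"]), inst["weights"])  # 1 [1]
```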

3 An O(log n)-Approximation Algorithm

In this section we give an approximation algorithm for the VSC problem with an approximation
ratio of $O(\log n)$. We then show that, unless $\mathcal{P} = \mathcal{NP}$, this ratio is the best possible up to a constant
factor by showing a lower bound of $\Omega(\log n)$ on the approximation ratio.

The greedy strategy applies naturally to the VSC problem: in each step, for each $1 \le i \le m$, pick
$\omega(A_i)$ sets in $A_i$, one at a time, each covering the maximum number of elements in $U$ that are still uncovered. The
algorithm stops when all the elements in $U$ are covered, and outputs the number of steps performed.

Algorithm 1 Greedy VSC algorithm 

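1. $\ell \gets 0$

2. $C \gets \emptyset$

3. while $C \ne U$

   (a) $\ell \gets \ell + 1$

   (b) for $1 \le i \le m$

       i. repeat $\omega(A_i)$ times

          A. find a set $S_j$ such that $S_j \in A_i$ and $|S_j \cap (U \setminus C)|$ is maximum

          B. pick $S_j$

          C. $C \gets C \cup S_j$

4. output $\ell$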
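A minimal Python sketch of Algorithm 1, assuming each $A_i$ is given as a list of Python sets and $\omega(A_i)$ as a positive integer (the names `greedy_vsc`, `parts`, and `weights` are hypothetical). Two details the pseudocode leaves implicit are assumptions of this sketch: a set is never picked twice, and an agent that has run out of sets skips the rest of its turn. The input check guarantees that the loop terminates.

```python
def greedy_vsc(universe, parts, weights):
    """Greedy VSC (Algorithm 1). Returns (steps, picks).

    parts[i] is the list of sets in A_i, weights[i] is omega(A_i) >= 1, and
    picks[i] lists the indices of the sets taken from parts[i], in order.
    """
    universe = set(universe)
    all_elements = set().union(*(s for part in parts for s in part))
    if not universe <= all_elements or min(weights, default=1) < 1:
        raise ValueError("the sets must cover U and every weight must be >= 1")
    covered = set()                                   # C
    available = [list(range(len(p))) for p in parts]  # sets of A_i not yet picked
    picks = [[] for _ in parts]
    steps = 0                                         # ell
    while not universe <= covered:                    # while C != U
        steps += 1
        for i, part in enumerate(parts):
            for _ in range(weights[i]):               # repeat omega(A_i) times
                if not available[i]:
                    break                             # agent i has nothing left to measure
                # The set in A_i covering the most elements of U \ C (lowest index on ties).
                j = max(available[i], key=lambda k: len(part[k] & (universe - covered)))
                available[i].remove(j)
                picks[i].append(j)
                covered |= part[j]
    return steps, picks


steps, picks = greedy_vsc({1, 2, 3, 4, 5, 6},
                          [[{1, 2, 3}, {4, 5, 6}], [{1, 2}, {3, 4}, {5, 6}]],
                          [1, 3])
print(steps, picks)  # 1 [[0], [2, 1, 0]]
```

As in the pseudocode, agents keep picking until the current step ends even after $U$ is covered; this does not change the returned step count.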

Theorem 1 Algorithm 1 gives an approximation ratio of $O(\log n)$.

We next prove Theorem 1. We first define the $\ell$-residual VSC problem. The input to this
problem is the input to the VSC problem after $\ell$ steps of the algorithm, with the same objective
function:

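• Let $n_\ell$ be the number of elements in $U$ that remain uncovered after $\ell$ steps of the algorithm. For $\ell = 0$,
$n_0 = n$.

• Let $C_\ell$ be the set of elements in $U$ that are covered up to step $\ell$.

• For all $1 \le j \le k = |\mathcal{S}|$ let $S_j^\ell = S_j \setminus C_\ell$.

• For all $1 \le i \le m$ let $A_i^\ell = A_i \setminus \{S_j \in A_i \mid S_j \text{ has been picked up to step } \ell\}$.

• Let $\mathcal{S}^\ell = \{S_j^\ell \mid S_j^\ell \ne \emptyset\}$.

• For all $1 \le i \le m$ let $\omega(A_i^\ell) = \omega(A_i)$.

• Let $\mathrm{OPT}_\ell$ be the optimal solution of the residual input after $\ell$ steps of the algorithm.

Then
$$\mathrm{OPT}_\ell = \min_{\mathcal{S}'} \max_{1 \le i \le m} \left\lceil \frac{|\mathcal{S}' \cap A_i^\ell|}{\omega(A_i^\ell)} \right\rceil,$$
where $\mathcal{S}'$ is a subcollection of $\mathcal{S}^\ell$ that covers all elements of $U \setminus C_\ell$.

Thus we get the following claim: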



Claim 1 At step $\ell \ge 1$ of Algorithm 1 at least $n_{\ell-1}/\mathrm{OPT}_{\ell-1}$ elements in $U$ are covered.

Proof: If $\ell = 1$ then, since Algorithm 1 picks a set that covers the maximum number of elements,
it holds that at least $n/\mathrm{OPT} = n_0/\mathrm{OPT}_0$ elements are covered at step 1. If $\ell > 1$ then an optimal
algorithm covers all the $n_{\ell-1}$ remaining elements of $U \setminus C_{\ell-1}$ in $\mathrm{OPT}_{\ell-1}$ steps. Since Algorithm 1
picks a set that covers the maximum number of remaining elements, it holds that at least $n_{\ell-1}/\mathrm{OPT}_{\ell-1}$
elements are covered at step $\ell$. ■

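Using the above claim and the observation that $\mathrm{OPT}_\ell \le \mathrm{OPT}$ for all $\ell$ (the sets of an optimal solution
that the algorithm has not yet picked still cover $U \setminus C_\ell$, with the same weights), we get the following
lemma.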

Lemma 2 $n_\ell \le n \left(1 - \frac{1}{\mathrm{OPT}}\right)^\ell$.

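Proof: By induction on $\ell$. For $\ell = 1$, by Claim 1 and since $\mathrm{OPT}_0 = \mathrm{OPT}$,
$$n_1 \le n - \frac{n}{\mathrm{OPT}} = n\left(1 - \frac{1}{\mathrm{OPT}}\right).$$
For $\ell = 2$,
$$n_2 \le n_1 - \frac{n_1}{\mathrm{OPT}_1} = n_1\left(1 - \frac{1}{\mathrm{OPT}_1}\right) \le n\left(1 - \frac{1}{\mathrm{OPT}}\right)\left(1 - \frac{1}{\mathrm{OPT}}\right) = n\left(1 - \frac{1}{\mathrm{OPT}}\right)^2.$$
Assume that for all $i < \ell$ it holds that $n_i \le n\left(1 - \frac{1}{\mathrm{OPT}}\right)^i$. Then
$$n_\ell \le n_{\ell-1} - \frac{n_{\ell-1}}{\mathrm{OPT}_{\ell-1}} = n_{\ell-1}\left(1 - \frac{1}{\mathrm{OPT}_{\ell-1}}\right) \le n\left(1 - \frac{1}{\mathrm{OPT}}\right)^{\ell-1}\left(1 - \frac{1}{\mathrm{OPT}_{\ell-1}}\right) \le n\left(1 - \frac{1}{\mathrm{OPT}}\right)^{\ell-1}\left(1 - \frac{1}{\mathrm{OPT}}\right) = n\left(1 - \frac{1}{\mathrm{OPT}}\right)^{\ell}, \tag{1}$$
where the last inequality uses $\mathrm{OPT}_{\ell-1} \le \mathrm{OPT}$. ■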



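Proof of Theorem 1: Throughout, $\log$ denotes the natural logarithm. If $\mathrm{OPT} = 1$ then Lemma 2 gives $n_1 = 0$,
so the algorithm stops after one step; hence assume $\mathrm{OPT} \ge 2$. In the worst case the algorithm stops after $\ell + 1$ steps
for the minimal $\ell$ such that $n_\ell < 1$. Since by the above lemma $n_\ell \le n\left(1 - \frac{1}{\mathrm{OPT}}\right)^\ell$, for $\ell$ for which
$n\left(1 - \frac{1}{\mathrm{OPT}}\right)^\ell < 1$ it holds that $n_\ell < 1$. Now
$$n\left(1 - \frac{1}{\mathrm{OPT}}\right)^\ell < 1 \iff \left(1 - \frac{1}{\mathrm{OPT}}\right)^\ell < \frac{1}{n} \iff \ell > \frac{\log(1/n)}{\log\left(1 - \frac{1}{\mathrm{OPT}}\right)} = \frac{\log n}{\log\left(\frac{\mathrm{OPT}}{\mathrm{OPT}-1}\right)} \iff \ell > \frac{\log n}{\log\left(1 + \frac{1}{\mathrm{OPT}-1}\right)}. \tag{2}$$
(The inequality flips in the middle step because $\log\left(1 - \frac{1}{\mathrm{OPT}}\right) < 0$.)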


³ Recall that $\mathrm{OPT}$ is the optimal solution.
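We now prove that $\frac{\log n}{\log\left(1 + \frac{1}{\mathrm{OPT}-1}\right)} < \log n \cdot \mathrm{OPT}$. It holds that
$$\frac{\log n}{\log\left(1 + \frac{1}{\mathrm{OPT}-1}\right)} < \log n \cdot \mathrm{OPT} \iff \log\left(1 + \frac{1}{\mathrm{OPT}-1}\right) > \frac{1}{\mathrm{OPT}} \iff 1 + \frac{1}{\mathrm{OPT}-1} > e^{1/\mathrm{OPT}}.$$
According to Taylor's theorem we have that
$$f(x) = \sum_{i=0}^{r} \frac{f^{(i)}(0)}{i!} x^i + R_r(x),$$
where
$$R_r(x) = \frac{f^{(r+1)}(c)}{(r+1)!} x^{r+1}$$
for some $0 < c < x$. For $f(x) = e^x$ we get that
$$e^x = \sum_{i=0}^{r} \frac{x^i}{i!} + \frac{e^c}{(r+1)!} x^{r+1}$$
for some $0 < c < x$. For $x = 1/\mathrm{OPT}$ and $r = 2$ we get that
$$e^{1/\mathrm{OPT}} = 1 + \frac{1}{\mathrm{OPT}} + \frac{1}{2\,\mathrm{OPT}^2} + \frac{e^c}{6\,\mathrm{OPT}^3}$$
for some $0 < c < 1/\mathrm{OPT}$. Now,
$$\begin{aligned}
1 + \frac{1}{\mathrm{OPT}-1} > e^{1/\mathrm{OPT}}
&\iff 1 + \frac{1}{\mathrm{OPT}-1} > 1 + \frac{1}{\mathrm{OPT}} + \frac{1}{2\,\mathrm{OPT}^2} + \frac{e^c}{6\,\mathrm{OPT}^3} \\
&\iff \frac{1}{\mathrm{OPT}-1} - \frac{1}{\mathrm{OPT}} > \frac{1}{2\,\mathrm{OPT}^2} + \frac{e^c}{6\,\mathrm{OPT}^3} \\
&\iff \frac{1}{(\mathrm{OPT}-1)\,\mathrm{OPT}} > \frac{1}{2\,\mathrm{OPT}^2} + \frac{e^c}{6\,\mathrm{OPT}^3} \\
&\iff \frac{1}{\mathrm{OPT}-1} > \frac{1}{2\,\mathrm{OPT}} + \frac{e^c}{6\,\mathrm{OPT}^2} \\
&\iff 6\,\mathrm{OPT}^2 > (\mathrm{OPT}-1)(3\,\mathrm{OPT} + e^c).
\end{aligned} \tag{3}$$
The last inequality is valid since $e^c < 3$ (as $c < 1/\mathrm{OPT}$), so $(\mathrm{OPT}-1)(3\,\mathrm{OPT} + e^c) < 3(\mathrm{OPT}-1)(\mathrm{OPT}+1) = 3\,\mathrm{OPT}^2 - 3 < 6\,\mathrm{OPT}^2$. Thus $1 + \frac{1}{\mathrm{OPT}-1} > e^{1/\mathrm{OPT}}$, so
$$\frac{\log n}{\log\left(1 + \frac{1}{\mathrm{OPT}-1}\right)} < \log n \cdot \mathrm{OPT}.$$
Therefore the number of steps used by Algorithm 1 is at most $1 + \log n \cdot \mathrm{OPT}$, and the theorem follows. ■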
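The key inequality of this proof, $\log(1 + 1/(\mathrm{OPT} - 1)) > 1/\mathrm{OPT}$ for natural logarithms and $\mathrm{OPT} \ge 2$, and its polynomial form (3) with $0 < c < 1/\mathrm{OPT}$, can be spot-checked numerically. This is a sanity check over a finite range of hypothetical $\mathrm{OPT}$ values, not a substitute for the proof.

```python
import math


def key_inequality_holds(opt):
    """log(1 + 1/(opt - 1)) > 1/opt, i.e. log n / log(1 + 1/(opt - 1)) < opt * log n."""
    return math.log1p(1 / (opt - 1)) > 1 / opt


def inequality_3_holds(opt, c):
    """Inequality (3): 6 opt^2 > (opt - 1)(3 opt + e^c), for 0 < c < 1/opt."""
    return 6 * opt**2 > (opt - 1) * (3 * opt + math.exp(c))


checks = all(key_inequality_holds(opt) for opt in range(2, 10_001))
# Taking c close to its upper limit 1/opt makes e^c, and thus the right side of (3), largest.
checks_3 = all(inequality_3_holds(opt, 0.999 / opt) for opt in range(2, 10_001))
print(checks, checks_3)  # True True
```

`math.log1p` is used instead of `math.log(1 + x)` to keep precision when $1/(\mathrm{OPT} - 1)$ is small.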

By [5, 2] it follows that unless $\mathcal{P} = \mathcal{NP}$ the approximation ratio of the set cover problem is
$\Omega(\log n)$. Since for $m = 1$ and $\omega(A_1) = 1$ the VSC problem is exactly the set cover
problem, we get that unless $\mathcal{P} = \mathcal{NP}$ the approximation ratio of the VSC problem is $\Omega(\log n)$.